SEALS Semantic Evaluation at Large Scale FP7 – 238975 D13.1 Evaluation design and collection of test data for semantic search tools
Abstract
This deliverable presents a systematic procedure for the evaluation of semantic search tools within the SEALS framework. A review of current search tool technologies and previous evaluation efforts is presented, which informs the approach adopted later in the document. The document describes how the SEALS evaluation methodology will be applied, setting out the criteria by which the tools will be assessed and identifying the specific metrics and interpretations that will be employed (a minimal scoring sketch follows this abstract). In addition, we describe a number of potential data sets and identify those which are suitable for our evaluation requirements.

A set of reference benchmark tests is described for assessing the strengths and weaknesses of the available tools and for comparing them. These tests focus on the performance of fundamental aspects of each tool in a strictly controlled environment or scenario rather than on its ability to solve open-ended, real-life problems. These fundamental aspects are the formal evaluation criteria by which we will benchmark each tool:

• Query expressiveness
• Usability (effectiveness, efficiency, satisfaction)
• Scalability
• Interoperability
• Suitability for the Semantic Web
• Quality of documentation

In common with the evaluation procedure employed by many of the other tool-type evaluation teams within SEALS, an automated evaluation approach will be adopted. However, in order to draw useful conclusions regarding the usability of a tool, it is essential that real users are asked to complete tasks using the tool. Therefore, for the purposes of evaluating semantic search tools, we adopt a two-phase approach: an automated phase and a user-in-the-loop phase. The user-in-the-loop phase consists of a series of experiments, conducted by the tool provider, in which human subjects are given a number of tasks (questions) to solve and a particular tool and ontology with which to do so.

Since the evaluation consists of two distinct phases, each addressing different criteria, each phase uses a different data set to ensure optimal applicability to each task:

• Mooney (user-in-the-loop)
• EvoOnt (automated)

The Mooney data set is well established in semantic search tool evaluations and has an appropriate set of questions for which the ground truth is known, making it an ideal data set for investigating the usability of a tool's interface. However, its limited size means it is less well suited to criteria such as scalability; the EvoOnt data set, which can be made significantly larger than the Mooney data set, is therefore used for the automated phase.

User-in-the-loop evaluations necessitate the provision of additional software beyond that provided by the SEALS consortium as a whole: software is required both to run the experimental workflows and to obtain test data and return results to the various SEALS repositories.
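As a concrete illustration of the kind of metric the automated phase computes, the sketch below scores a tool's returned answers against a known ground truth for each query (as is available for the Mooney questions), using standard set-based precision, recall and F1, macro-averaged over the benchmark. This is a minimal sketch under stated assumptions, not the SEALS platform API: the data layout and all names (QueryResult, macro_average, the demo data) are hypothetical and introduced only for this example.

```python
# Minimal sketch (not the SEALS platform API) of scoring a semantic
# search tool against ground-truth answer sets. All names and the
# data layout are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class QueryResult:
    returned: set[str]   # answers the tool returned for one query
    expected: set[str]   # ground-truth answers for that query

def precision_recall_f1(result: QueryResult) -> tuple[float, float, float]:
    """Standard set-based precision/recall/F1 for a single query."""
    if not result.returned:
        # Nothing returned: recall is vacuously perfect only if
        # nothing was expected.
        return 0.0, (1.0 if not result.expected else 0.0), 0.0
    hits = len(result.returned & result.expected)
    precision = hits / len(result.returned)
    recall = hits / len(result.expected) if result.expected else 1.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def macro_average(results: list[QueryResult]) -> tuple[float, float, float]:
    """Average per-query scores across the whole benchmark."""
    scores = [precision_recall_f1(r) for r in results]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))

if __name__ == "__main__":
    # Hypothetical demo data standing in for a benchmark run.
    demo = [
        QueryResult(returned={"a", "b", "c"}, expected={"a", "b"}),
        QueryResult(returned={"x"}, expected={"x", "y"}),
    ]
    p, r, f = macro_average(demo)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The user-in-the-loop usability metrics would be aggregated in the same macro-averaged way, but from different raw measurements: effectiveness from task completion rates, efficiency from task times, and satisfaction from questionnaire scores.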
Similar Resources
SEALS Semantic Evaluation at Large Scale FP7 – 238975 D12.1 Evaluation Design and Collection of Test Data for Matching Tools
This deliverable presents a systematic procedure for evaluating ontology matching systems and algorithms in the context of the SEALS project. It describes the criteria and metrics on which the evaluations will be carried out and the characteristics of the test data to be used, as well as the evaluation target, which includes the systems generating the alignments for evaluation.
Semantic Evaluation at Large Scale FP7 – 238975 D11.1 Evaluation design and collection of test data for advanced reasoning systems
This deliverable reviews the current state of advanced reasoning system evaluation and presents the definition of the evaluations and test data that will be used in the first SEALS Evaluation Campaign. The tests cover the interoperability and performance of advanced reasoning systems.
SEALS Semantic Evaluation at Large Scale FP7 – 238975 D10.1 Evaluation Design and Collection of Test Data for Evaluating Ontology Engineering Tools
This deliverable reviews the current state of ontology engineering tool evaluation and presents the definition of the evaluations and test data that will be used in the first SEALS Evaluation Campaign over ontology engineering tools, which will target the conformance, interoperability and scalability of such tools.
SEALS Semantic Evaluation at Large Scale FP7 – 238975 D12.3 Results of the first evaluation of matching tools
This deliverable reports the results of the first SEALS evaluation campaign, which has been carried out in coordination with the OAEI 2010 campaign. A subset of the OAEI tracks has been included in a new modality, the SEALS modality. From the participant's point of view, the main innovation is the use of a web-based interface for launching evaluations. 13 systems, out of 15 f...
Large Scale FP7 – 238975 D12.5 Iterative implementation of services for the automatic evaluation of matching tools - v2.0 - FR
This deliverable reports on the current status of the service implementation for the automatic evaluation of matching tools, and on the final status of those services. These services have been used in the third SEALS evaluation of matching systems, held in Spring 2012 in coordination with the OAEI 2011.5 campaign. We worked mainly on the tasks of modifying the WP12 BPEL work-...